EnrichedHeatmap 之花样玩法

Original JunJunLab 老俊俊的生信笔记 2022-08-15

收录于合集

#100天生信/数据科学自我挑战 100 个

#R包分享 93 个

我们都有各自的生活

1、多个热图

EnrichedHeatmap 包的强大之处在于，并行热图可以连接起来，用于富集热图、普通热图以及行注释，这提供了一种非常有效的方式来可视化多个信息源。

借助 ComplexHeatmap 包的功能，可以通过+运算符连接热图。Heatmap 对象和 HeatmapAnnotation 对象可以混合使用。

以下热图显示了 H3K4me3 修饰、甲基化和基因表达之间的对应关系。很容易看到高表达与 TSS 周围的低甲基化和高 H3K4me3 信号相关：

# 绘图
EnrichedHeatmap(mat1, col = col_fun, name = "H3K4me3",
                top_annotation = HeatmapAnnotation(enrich = anno_enriched(axis_param = list(side = "left")))) +
  EnrichedHeatmap(mat2, col = meth_col_fun, name = "methylation") +
  Heatmap(log2(rpkm+1), col = c("white", "orange"), name = "log2(rpkm+1)",
          show_row_names = FALSE, width = unit(5, "mm"))

当然，也可以在主热图中按 分类变量 或 k 均值聚类来拆分 行。在以下热图中，最右边的颜色条可以对应于组蛋白修饰热图和甲基化热图上列注释中的颜色。

在这里我们再次强调，对矩阵进行适当的修剪将极大地有助于揭示模式。可以尝试将 mat1 替换为未修剪的矩阵，看看下面显示的这种模式是否仍然保留：

# 聚类
partition = paste0("cluster", kmeans(mat1, centers = 3)$cluster)
# 图例
lgd = Legend(at = c("cluster1", "cluster2", "cluster3"), title = "Clusters",
             type = "lines", legend_gp = gpar(col = 2:4))
# 绘图
ht_list = Heatmap(partition, col = structure(2:4, names = paste0("cluster", 1:3)),
                  name = "partition", # 注释
                  show_row_names = FALSE, width = unit(3, "mm")) +
  # H3K4me3
  EnrichedHeatmap(mat1, col = col_fun, name = "H3K4me3",
                  top_annotation = HeatmapAnnotation(lines = anno_enriched(gp = gpar(col = 2:4))),
                  column_title = "H3K4me3") +
  # methylation
  EnrichedHeatmap(mat2, col = meth_col_fun, name = "methylation",
                  top_annotation = HeatmapAnnotation(lines = anno_enriched(gp = gpar(col = 2:4))),
                  column_title = "Methylation") +
  # rpkm
  Heatmap(log2(rpkm+1), col = c("white", "orange"), name = "log2(rpkm+1)",
          show_row_names = FALSE, width = unit(15, "mm"),
          top_annotation = HeatmapAnnotation(summary = anno_summary(gp = gpar(fill = 2:4),
                                                                    outline = FALSE, axis_param = list(side = "right"))))
# 绘制
draw(ht_list, split = partition, annotation_legend_list = list(lgd),
     ht_gap = unit(c(2, 8, 8), "mm"))

2、分开展示阳性和阴性信号

有时，我们将某些基因组目标周围的一般相关性或组差异可视化。在这种情况下，将阳性信号和阴性信号的富集分开可视化更有意义。在以下示例中，变量 mat_H3K4me1 包含 H3K4me1 信号与基因 TSS (-5kb, 10kb) 中基因表达之间的相关性：

# 加载数据
load(system.file("extdata", "H3K4me1_corr_normalize_to_tss.RData", package = "EnrichedHeatmap"))
mat_H3K4me1
Normalize to target:
  Upstream 5000 bp (100 windows)
  Downstream 10000 bp (200 windows)
  Include target regions (width = 1)
  677 target regions

在 anno_enriched() 中，gp 有两个非标准参数 neg_col 和 pos_col。如果设置了这两个，则分别为矩阵中的正信号和负信号绘制富集曲线：

# 设置颜色
corr_col_fun = colorRamp2(c(-1, 0, 1), c("darkgreen", "white", "red"))
# 绘图
EnrichedHeatmap(mat_H3K4me1, col = corr_col_fun, name = "corr_H3K4me1",
                top_annotation = HeatmapAnnotation(
                  enrich = anno_enriched(gp = gpar(neg_col = "darkgreen", pos_col = "red"),
                                         axis_param = list(side = "left"))
                ), column_title = "separate neg and pos") +
  EnrichedHeatmap(mat_H3K4me1, col = corr_col_fun, show_heatmap_legend = FALSE,
                  top_annotation = HeatmapAnnotation(enrich = anno_enriched(value = "abs_mean")),
                  column_title = "pool neg and pos")

3、继承 ComplexHeatmap 包的参数

由于 EnrichedHeatmap 建立在 ComplexHeatmap 包之上，因此 ComplexHeatmap 中的功能可以直接用于 EnrichedHeatmap。如前所述，热图可以通过 row_km 或 row_spilt 参数进行分割。

可以通过 row_order() 检索行的顺序：

ht_list = draw(ht_list)
row_order(ht_list)

由于 EnrichedHeatmap 和 EnrichedHeamtapList 类分别继承自 Heamtap 和 HeamtapList 类，因此后两个类中的所有高级参数都可以直接用于前两个类中。

例如：更改热图标题的图形设置：

EnrichedHeatmap(..., column_title_gp = ...)

更改图例：

EnrichedHeatmap(..., heatmap_legend_param = ...)
# or set is globally
ht_opt(...)
EnrichedHeatmap(...)
ht_opt(RESET = TRUE)

设置热图宽度：

EnrichedHeatmap(..., width = ...) + EnrichedHeatmap(...)

4、汇总多个数据

假设有一个不同样本的组蛋白修饰信号列表，并且你想要可视化跨样本的平均模式。你可以首先对每个样本的组蛋白标记信号 进行归一化，然后 计算所有样本的均值。在以下示例代码中，hm_gr_list 是包含组蛋白修饰位置的 GRanges 对象列表，tss 是包含基因 TSS 位置的 GRanges 对象：

# 批量归一化为矩阵
mat_list = NULL
for(i in seq_along(hm_gr_list)) {
    mat_list[[i]] = normalizeToMatrix(hm_gr_list[[i]], tss, value_column = ...)
}

如果我们将矩阵列表压缩为三维阵列，其中第一维度对应于基因，则第二维对应于窗口，并且第三维对应于样本，可以在第三维上计算所有样本的平均信号。getsignalsfromlist（）简化了这个操作。

将 getsignalsfromlist（）应用于 mat_list，它给出了一个新的归一化矩阵，其中包含所有样本的平均信号，可以直接用于 EnRichedHeatMap（）：

mat_mean = getSignalsFromList(mat_list)
EnrichedHeatmap(mat_mean)

也可以在阵列的第三维上计算组蛋白修饰和基因表达之间的相关性。在用户定义的函数 fun 中，x 是阵列中基因 i 和窗口 j 的向量，i 是当前基因的索引：

mat_corr = getSignalsFromList(mat_list, fun = function(x, i) cor(x, expr[i, ], method = "spearman"))

然后，这里的 mat_corr 可用于可视化基因表达与 TSS 周围的组蛋白修饰相关：

EnrichedHeatmap(mat_corr)

5、使用自己的矩阵

normalizetomatrix（）用于将基因组信号与目标之间的关联标准化。返回的值只是一个简单的矩阵，但附加了几个属性。有时，用户可能有自己的方法来生成这样的矩阵。很容易添加其它的属性并发送到 EnRichedHeamTap（）以进行可视化。

下面可以获取属性，基本上它们用于制作轴和标签：

attr(mat, "upstream_index")
attr(mat, "target_index")
attr(mat, "downstream_index")
attr(mat, "extend")

在以下代码中，mat2 是一个简单的矩阵，它只包含 dim 属性。mat2 可以被认为是从其他方法获得的矩阵：

mat1 = normalizeToMatrix(H3K4me3, tss, value_column = "coverage",
                         extend = 5000, mean_mode = "w0", w = 50)
mat2 = mat1
attributes(mat2) = NULL
dim(mat2) = dim(mat1)
mat2[1:4, 1:4]
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    0    0    0    0

正如我们已经知道的那样，在 mat2 中，上游扩展到每 50bp 窗口的 5KB，这意味着前 100 个列对应于上游。类似于下游为最后 100 列。这里的目标是 TSS，可以认为没有宽度。因此，我们可以设置上游，目标和下游的列索引属性，如下所示：

attr(mat2, "upstream_index") = 1:100
attr(mat2, "target_index") = integer(0)
attr(mat2, "downstream_index") = 101:200
attr(mat2, "extend") = c(5000, 5000)  # it must be a vector of length two

并且不要忘记将 mat2 设置为 normalizedMatrix 类。现在 mat2 是 enrichedheamtap（）的有效对象：

class(mat2) = c("normalizedMatrix", "matrix")
mat2
Normalize  to :
  Upstream 5000 bp (100 windows)
  Downstream 5000 bp (100 windows)
  Not include target regions
  720 target regions

以上四个属性足以来绘制热图，还可以添加其它属性：

attr(mat2, "signal_name") = "H3K4me3"
attr(mat2, "target_name") = "TSS"
mat2
Normalize H3K4me3 to TSS:
  Upstream 5000 bp (100 windows)
  Downstream 5000 bp (100 windows)
  Not include target regions
  720 target regions

为了使转换更容易，用户可以直接使用 as.normalizedMatrix() 进行转换：

attributes(mat2) = NULL
dim(mat2) = dim(mat1)
as.normalizedMatrix(mat2,
    k_upstream = 100,
    k_downstream = 100,
    k_target = 0,
    extend = c(5000, 5000),
    signal_name = "H3K4me3",
    target_name = "TSS"
)
Normalize H3K4me3 to TSS:
  Upstream 5000 bp (100 windows)
  Downstream 5000 bp (100 windows)
  Not include target regions
  720 target regions